Update aggregate_spatial process implementation #268

jzvolensky · 2024-09-03T09:15:17Z

Related issue: aggregate_spatial improvements EOEPCA/openeo-processes-dask#1

We have been investigating the performance of aggregate_spatial on different datasets. Currently it does not perform well.

We have found that some methods, such as iterate, do not scale well at all. The exactextract method appears to perform much better (see https://gist.github.com/clausmichele/6d9bba3a82f39c8c91c0cf5d263e1521).

The current idea is to make exactextract the default method. Comment from Michele:

"However, since it doesn't allow callbacks as a function to be applied to the data within the polygon, the code should select a different method if a callback/function has to be applied to the data."

So this will need to be taken into account.
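A minimal sketch of that selection rule, assuming the reducer arrives either as a plain statistic name (a string) or as a callable callback. The function name `select_method` is hypothetical and not part of openeo-processes-dask:

    from typing import Callable, Union


    def select_method(reducer: Union[str, Callable]) -> str:
        """Default to exactextract; fall back to iterate when a callback must be applied."""
        if callable(reducer):
            # exactextract cannot run an arbitrary function on the pixels in a polygon.
            return "iterate"
        # A plain statistic name such as "mean" can be handed to exactextract.
        return "exactextract"

For example, `select_method("mean")` returns "exactextract", while any function or partial object returns "iterate".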

Related issue: aggregate_spatial: crs of data and geometry mismatch undefined openeo-processes#499

The following improvement will be made: the geometries are reprojected to the data CRS, and the resulting vector datacube has the CRS of the input data.

This is currently work in progress. Feel free to add some comments or suggestions!
Thanks.

cc @clausmichele


clausmichele · 2024-09-09T08:16:52Z

@jzvolensky the tests are failing, please try to fix them.

jzvolensky · 2024-09-19T07:50:21Z

Hi @clausmichele @ValentinaHutter,

So I ran into a bit of trouble trying to rework the aggregate_spatial process.

We have discussed adding a context parameter that lets the user choose the extraction method, i.e. xvec's iterate or exactextract. Since exactextract does not support every possible combination of the statistical processes, we need some way to distinguish between the cases based on what is present in the process graph.

Before dealing with that, I first tried to implement only the context parameter that switches the method. Here we ran into an issue.

First, we add the context parameter:

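    from typing import Callable, Dict, Optional, Union

    from openeo_processes_dask.process_implementations.data_model import RasterCube, VectorCube


    def aggregate_spatial(
        data: RasterCube,
        geometries,
        reducer: Union[str, Callable],
        chunk_size: int = 2,
        context: Optional[Dict[str, str]] = None,
    ) -> VectorCube:
        ...  # implementation omitted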

Then, here is an example of how the method can be supplied:

    aggregate = s2_datacube.aggregate_spatial(geometries=polys, reducer="mean", context={"method": "exactextract"})
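One way the context parameter could be honoured inside the implementation is sketched below. It assumes that context["method"] holds either "iterate" or "exactextract" and that only string reducers can be passed to exactextract. The helper `resolve_method` is hypothetical:

    from typing import Callable, Dict, Optional, Union

    VALID_METHODS = ("iterate", "exactextract")


    def resolve_method(
        reducer: Union[str, Callable],
        context: Optional[Dict[str, str]] = None,
    ) -> str:
        """Pick the extraction method, letting context["method"] override the default."""
        requested = (context or {}).get("method")
        if requested is None:
            # No explicit choice: exactextract for named statistics, iterate for callbacks.
            return "iterate" if callable(reducer) else "exactextract"
        if requested not in VALID_METHODS:
            raise ValueError(f"Unknown method {requested!r}, expected one of {VALID_METHODS}")
        if requested == "exactextract" and callable(reducer):
            # Fail early with a clear message instead of deep inside exactextract.
            raise ValueError("exactextract needs a statistic name, not a callable reducer")
        return requested

Raising on an explicit exactextract request combined with a callable reducer, rather than silently switching to iterate, is one possible design choice. It makes the mismatch visible to the user.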

The problem is that, in a real process graph, the reducer is provided as a partial function object like this:

    REDUCER: functools.partial(<function OpenEOProcessGraph._map_node_to_callable.<locals>.node_callable at 0x7fd3bfdf3250>, parent_callables=[])

This works for the iterate method, but exactextract does not support it, so the call fails with:

    ValueError: functools.partial(<function OpenEOProcessGraph._map_node_to_callable.<locals>.node_callable at 0x7fd3bfdf3250>, parent_callables=[]) is not a valid aggregation.

I tried to dig into the reducer object, but I could not find any part we could extract and turn into a string for exactextract to use as stats. Is there a way to get this information from the object?

Otherwise, we cannot match the reducer statistic provided by the user with what exactextract supports.

Let me know if you have any ideas. Ideally, we would end up with a list of processes available in both openEO and exactextract, such as mean, min, max, etc., and select the method based on the reducer provided, for better performance.
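A sketch of such a list is shown below. It assumes the openEO process id of the reducer can be recovered, which is exactly the open question here. Some names differ between the two projects (openEO's `sd` vs exactextract's `stdev`), so a mapping is more useful than a plain list. The entries are an assumption to verify against both specifications, and `exactextract_stat` is hypothetical:

    from typing import Optional

    # Assumed correspondence between openEO reducer process ids and exactextract
    # operation names; verify against both specifications before relying on it.
    OPENEO_TO_EXACTEXTRACT = {
        "mean": "mean",
        "min": "min",
        "max": "max",
        "sum": "sum",
        "count": "count",
        "median": "median",
        "sd": "stdev",
        "variance": "variance",
    }


    def exactextract_stat(process_id: str) -> Optional[str]:
        """Return the exactextract op for an openEO process id, or None if unsupported."""
        return OPENEO_TO_EXACTEXTRACT.get(process_id)

A `None` result would mean the statistic has no exactextract equivalent, so the iterate method would have to be used.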

clausmichele · 2024-09-19T09:07:36Z

@jzvolensky you can get it in this way:

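    from functools import partial

    from openeo_pg_parser_networkx.pg_schema import ParameterReference

    # process_registry: the openeo-processes-dask process registry (e.g. the test fixture)
    _process = partial(
        process_registry["mean"].implementation,
        ignore_nodata=True,
        data=ParameterReference(from_parameter="data"),
    )
    print(repr(_process))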

outputs

    functools.partial(<function mean at 0x7f81a51a1cf0>, ignore_nodata=True, data=ParameterReference(from_parameter='data'))
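When the partial wraps the process implementation directly, as in the snippet above, the process name can be recovered from the standard `functools.partial` attributes `func`, `args` and `keywords`. The self-contained sketch below uses a stand-in `mean` function in place of the registry implementation:

    from functools import partial


    def mean(data, ignore_nodata=True):
        """Stand-in for the registered openEO `mean` implementation."""
        return sum(data) / len(data)


    _process = partial(mean, ignore_nodata=True)

    # functools.partial exposes the wrapped function and the frozen arguments.
    print(_process.func.__name__)  # mean
    print(_process.keywords)       # {'ignore_nodata': True}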

jzvolensky · 2024-09-19T12:07:15Z


This is great and will probably be useful for the tests. However, in the implementation code it gives the same result as what I have been doing. In the implementation we need to deconstruct the reducer object, but:

    print(f"REDUCER: {reducer}")

    functools.partial(<function OpenEOProcessGraph._map_node_to_callable.<locals>.node_callable at 0x7fc62e859f30>, parent_callables=[])

    print(f"REDUCER REPR: {reducer.__repr__()}")

    functools.partial(<function OpenEOProcessGraph._map_node_to_callable.<locals>.node_callable at 0x7fc62e859f30>, parent_callables=[])

which is unfortunate :(

clausmichele · 2024-10-01T14:34:13Z

@jzvolensky try with

    import inspect
    inspect.getsource(reducer.func)

Edit: after several attempts I'm also stuck. Perhaps, with the current openeo-pg-parser-networkx code, it is not possible to reconstruct from a Callable the sequence of functions it calls? @GeraldIr @ValentinaHutter
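In the failing case, `reducer.func` is the generic `node_callable` closure, so its name, repr and source (via `inspect.getsource`) are identical for every process. One avenue that may be worth checking is the closure itself: the standard-library `inspect.getclosurevars` lists the variables a closure captures. The toy `make_node_callable` below only mimics that pattern. Whether the real `node_callable` in openeo-pg-parser-networkx captures the process id, or anything from which it can be derived, is an assumption and is not confirmed here:

    import inspect
    from functools import partial


    def make_node_callable(process_id, process_impl):
        """Toy stand-in for a parser that wraps each process in a generic closure."""

        def node_callable(*args, parent_callables, **kwargs):
            try:
                return process_impl(*args, **kwargs)
            except Exception as exc:
                # Using process_id here makes it a captured (nonlocal) variable.
                raise RuntimeError(f"process {process_id!r} failed") from exc

        return node_callable


    def toy_mean(data):
        return sum(data) / len(data)


    reducer = partial(make_node_callable("mean", toy_mean), parent_callables=[])

    print(reducer)                # repr only shows the generic node_callable
    print(reducer.func.__name__)  # node_callable
    print(inspect.getclosurevars(reducer.func).nonlocals["process_id"])  # mean

Even if this works on the real closure, it would only identify the final node of the child process graph. A reducer built from several chained processes would still have to fall back to the iterate method.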

clausmichele and others added 12 commits February 28, 2024 10:14

fix ressample_spatial dims (9df6f9c)

Merge branch 'main' of github.com:clausmichele/openeo-processes-dask into main (42809a2)

Merge branch 'main' of github.com:clausmichele/openeo-processes-dask into main (fb3c090)

switch from stackstac to odc-stac (a062e06)

auto apply scale and offset (87c4f9c)

fix no data value for STAC (7371c06)

fix no data value for STAC (73bcd0c)

Merge branch 'Open-EO:main' into feature/load_stac_odc (2e3edac)

first version of testing and some results. First draft of the report. (f665d23)

Merge pull request #2 from interTwin-eu/feature/load_stac_odc: Feature/load stac odc (89f5fbd)

Collected a bunch of scripts used for testing, memory profiling of agg_spatial(temporary), added disclaimer to report.md with the most up to date instructions while the docs are being reworked (a2bb11a)

Merge remote-tracking branch 'upstream/main' into benchmark (f76ecc2)
